Cross-lingual Word Sense Disambiguation: Documentation on Data and Evaluation

نویسندگان

  • Els Lefever
  • Veronique Hoste
چکیده

We propose a multilingual unsupervised Word Sense Disambiguation (WSD) task for a sample of English nouns. Instead of providing manually sense-tagged examples for each sense of a polysemous noun, our sense inventory is built up on the basis of the Europarl parallel corpus. The multilingual setup involves the translations of a given English polysemous noun in five supported languages, viz. Dutch, French, German, Spanish and Italian. Organizing this task consists in: (a) the manual creation of a multilingual sense inventory for a lexical sample of English nouns and (b) the evaluation of systems on their ability to disambiguate new occurrences of the selected polysemous nouns. For the creation of the hand-tagged gold standard, all translations of a given polysemous English noun are retrieved in the five languages and clustered by meaning. There are two types of scoring:

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation

In this paper, we address the shortage of evaluation benchmarks on Persian (Farsi) language by creating and making available a new benchmark for English to Persian Cross Lingual Word Sense Disambiguation (CL-WSD). In creating the benchmark, we follow the format of the SemEval 2013 CL-WSD task, such that the introduced tools of the task can also be applied on the benchmark. In fact, the new benc...

متن کامل

Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD) with the application of English to Persian. Our proposed approach targets the languages with scarce resources (low-density) by exploiting word embedding and semantic similarity of the words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art u...

متن کامل

LIMSI : Cross-lingual Word Sense Disambiguation using Translation Sense Clustering

We describe the LIMSI system for the SemEval-2013 Cross-lingual Word Sense Disambiguation (CLWSD) task. Word senses are represented by means of translation clusters in different languages built by a cross-lingual Word Sense Induction (WSI) method. Our CLWSD classifier exploits the WSI output for selecting appropriate translations for target words in context. We present the design of the system ...

متن کامل

UvT-WSD1: A Cross-Lingual Word Sense Disambiguation System

This paper describes the Cross-Lingual Word Sense Disambiguation system UvTWSD1, developed at Tilburg University, for participation in two SemEval-2 tasks: the Cross-Lingual Word Sense Disambiguation task and the Cross-Lingual Lexical Substitution task. The UvT-WSD1 system makes use of k-nearest neighbour classifiers, in the form of single-word experts for each target word to be disambiguated. ...

متن کامل

Measuring the Adequacy of Cross-Lingual Paraphrases in a Machine Translation Setting

Following the growing trend in the semantics community towards models adapted to specific applications, the SemEval-2 Cross-Lingual Lexical Substitution and Word Sense Disambiguation tasks address the disambiguation needs of Machine Translation (MT). The experiments conducted in this study aim at assessing whether the proposed evaluation protocol and methodology provide a fair estimate of the a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009